2 Local and Almost Linear-time Clustering and Partitioning
2.1 Review of Local Clustering
2.2 General Strategy

Abstract

1 Administrivia

You should probably know that
• the first problem set (due October 15) is posted on the class website, and
• its hints are also posted there.
Also, today in class there was a majority vote for posting problem sets earlier. Professor Kelner will post the problem sets from two years ago, but he reserves the right to add new problems once a problem set has already been posted.

Questions from last time.
• What is a level set? The level set of a function corresponding to a (fixed) constant c is the set of points in the function's domain whose image equals c.
• What is a good reference on applications of expander graphs? A course taught by Nathan Linial and Avi Wigderson [3].

Plan for today. We use what we proved last time to obtain a local clustering algorithm from a random walk scheme. Then, noting that results similar to the ones proved last time also hold for PageRank, we obtain a second scheme that yields a second, better local clustering algorithm. Finally, we briefly motivate the technique of sparsification, which we will discuss next time.

Let us briefly review local clustering, which we introduced last time. Given a vertex v in some graph G, we would like to know if v is contained in a cluster, i.e. a subset of vertices that defines a cut with low conductance. However, we want the running time of our algorithm to depend on the cluster size, and not on the size of the graph. Last time we mentioned that a good example of a problem of this sort is trying to find a cluster of web pages around mit.edu; we surely do not want the running time of this task to depend on the number of sites created on the other side of the world. Let us make our goal a little more precise: in this lecture we will describe an algorithm that, after running for time almost linear in K, outputs a cluster of size at least K/2 around the starting vertex, if such a cluster exists.

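To make "a cut with low conductance" concrete, here is a minimal Python sketch. It assumes the common definition, conductance(S) = (number of edges leaving S) / min(vol(S), vol(V \ S)), where vol is the sum of degrees; the lecture may normalize slightly differently. The adjacency-dictionary representation and the names `conductance` and `two_triangles` are illustrative choices, not part of the notes.

```python
from typing import Dict, Hashable, Iterable, Set

# Undirected graph as an adjacency dictionary: vertex -> set of neighbours.
Graph = Dict[Hashable, Set[Hashable]]


def conductance(graph: Graph, cluster: Iterable[Hashable]) -> float:
    """Conductance of the cut (S, V \\ S) under the assumed definition."""
    s = set(cluster)
    # Volume = sum of degrees on each side of the cut.
    vol_s = sum(len(graph[u]) for u in s)
    vol_rest = sum(len(graph[u]) for u in graph if u not in s)
    # Each crossing edge is counted once, from its endpoint inside S.
    cut_edges = sum(1 for u in s for w in graph[u] if w not in s)
    smaller_side = min(vol_s, vol_rest)
    if smaller_side == 0:
        raise ValueError("conductance is undefined when one side has volume 0")
    return cut_edges / smaller_side


def two_triangles() -> Graph:
    """Hypothetical toy graph: triangles {0,1,2} and {3,4,5} joined by edge 2-3."""
    edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
    graph: Graph = {u: set() for u in range(6)}
    for u, w in edges:
        graph[u].add(w)
        graph[w].add(u)
    return graph
```

On this toy graph, the triangle {0, 1, 2} has one crossing edge and volume 7, so its conductance is 1/7, while the single vertex {0} has conductance 1. Under this definition the triangle is the much better cluster.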
We observe that if we run a random walk starting from some vertex v contained in a cluster, then low-conductance cuts will be an obstacle to mixing; i.e., the random walk has trouble leaving the cluster. Hence, a good guess for the cluster is the set of vertices with the …
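The observation that a random walk has trouble leaving a cluster can be checked on a small example. The sketch below is an illustration under stated assumptions, not the algorithm from the lecture: it uses a lazy random walk (stay put with probability 1/2, a common convention that the abstract does not confirm), a hypothetical barbell graph, and the made-up helper names `barbell`, `lazy_walk_step`, `walk_distribution`, and `mass_inside`. It does not attempt to reconstruct the truncated rule for choosing the cluster.

```python
from typing import Dict, Iterable, Set

Graph = Dict[int, Set[int]]


def barbell(k: int) -> Graph:
    """Two k-cliques, {0..k-1} and {k..2k-1}, joined by the single edge (k-1, k)."""
    graph: Graph = {u: set() for u in range(2 * k)}
    for block in (range(k), range(k, 2 * k)):
        for u in block:
            for w in block:
                if u != w:
                    graph[u].add(w)
    graph[k - 1].add(k)
    graph[k].add(k - 1)
    return graph


def lazy_walk_step(graph: Graph, p: Dict[int, float]) -> Dict[int, float]:
    """One step of the lazy walk; assumes every vertex has at least one neighbour."""
    # Half of each vertex's mass stays where it is.
    nxt = {u: 0.5 * p[u] for u in graph}
    # The other half is split evenly among the neighbours.
    for w in graph:
        share = 0.5 * p[w] / len(graph[w])
        for u in graph[w]:
            nxt[u] += share
    return nxt


def walk_distribution(graph: Graph, start: int, steps: int) -> Dict[int, float]:
    """Distribution of the lazy walk after `steps` steps, starting at `start`."""
    p = {u: 0.0 for u in graph}
    p[start] = 1.0
    for _ in range(steps):
        p = lazy_walk_step(graph, p)
    return p


def mass_inside(p: Dict[int, float], cluster: Iterable[int]) -> float:
    """Total probability mass currently on the vertices of `cluster`."""
    return sum(p[u] for u in cluster)
```

For example, `mass_inside(walk_distribution(barbell(5), 0, 5), range(5))` measures how much of the walk is still in the starting clique after five steps. In this toy graph most of the mass remains there, because mass can only escape through the single bridging edge.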

Similar resources

Assessment of the Performance of Clustering Algorithms in the Extraction of Similar Trajectories

In recent years, the tremendous and increasing growth of spatial trajectory data and the necessity of processing and extraction of useful information and meaningful patterns have led to the fact that many researchers have been attracted to the field of spatio-temporal trajectory clustering. The process and analysis of these trajectories have resulted in the extraction of useful information whic...


Entropy-based Consensus for Distributed Data Clustering

The increasingly larger scale of available data and the more restrictive concerns on their privacy are some of the challenging aspects of data mining today. In this paper, Entropy-based Consensus on Cluster Centers (EC3) is introduced for clustering in distributed systems with a consideration for confidentiality of data; i.e. it is the negotiations among local cluster centers that are used in t...


A Hybrid Data Clustering Algorithm Using Modified Krill Herd Algorithm and K-MEANS

Data clustering is the process of partitioning a set of data objects into meaning clusters or groups. Due to the vast usage of clustering algorithms in many fields, a lot of research is still going on to find the best and efficient clustering algorithm. K-means is simple and easy to implement, but it suffers from initialization of cluster center and hence trapped in local optimum. In this paper...


Partition-Based Clustering in Object Bases: From Theory to

We classify clustering algorithms into sequence-based techniques, which transform the object net into a linear sequence, and partition-based clustering algorithms. Tsangaris and Naughton [TN91, TN92] have shown that the partition-based techniques are superior. However, their work is based on a single partitioning algorithm, the Kernighan and Lin heuristics, which is not applicable to realistica...


Tabu-KM: A Hybrid Clustering Algorithm Based on Tabu Search Approach

The clustering problem under the criterion of minimum sum of squares is a non-convex and non-linear program, which possesses many locally optimal values, with the result that its solution often falls into these traps and therefore cannot converge to a globally optimal solution. In this paper, an efficient hybrid optimization algorithm is developed for solving this problem, called Tabu-KM. It gathers the ...

Publication date: 2009